Abstract

We introduce two abstract theorems that reduce a variety of complex
exponential distributional approximation problems to the construction
of couplings. These are applied to obtain rates of convergence with
respect to the Wasserstein and Kolmogorov metrics for the theorem of
Rényi on random sums and generalizations of it, hitting times for
Markov chains, and to obtain a new rate for the classical theorem of
Yaglom on the exponential asymptotic behavior of a critical
Galton-Watson process conditioned on non-extinction. The primary tools
are an adaptation of Stein's method, Stein couplings, as well as the
equilibrium distributional transformation from renewal theory.



1 INTRODUCTION 

The exponential distribution arises as an asymptotic limit in a wide variety
of settings involving rare events, extremes, waiting times, and
quasi-stationary distributions. As discussed in the preface of Aldous (1989),
the tremendous difficulty in obtaining explicit bounds on the error of the
exponential approximation in more than the most elementary of settings
apparently has left a gap in the literature. The classical theorem of Yaglom
(1947) describing the asymptotic exponential behavior of a critical
Galton-Watson process conditioned on non-extinction, for example, has a large
literature of extensions and embellishments (see Lalley and Zheng (in press)
for example) but the complex dependencies between offspring have apparently
not previously allowed for obtaining explicit error bounds. Stein's method,
introduced in Stein (1972), is now a well established method for obtaining
explicit bounds in distributional approximation problems in settings with
dependencies (see Ross and Peköz (2007) for an introduction). Results for the
normal and Poisson approximation, in particular, are extensive but are also
currently being very actively developed further; see e.g. Chatterjee (2008)
and Chen and Röllin (2009).

While many discrete distributions on the real line have been tackled using
Stein's method, such as the binomial, geometric, and compound Poisson
distributions in Ehm (1991), Peköz (1996), Barbour, Chen, and Loh (1992a),
and many other articles, results for continuous distributions besides the
normal are far less developed. Diaconis and Zabell (1991) and Schoutens
(2001) give theoretical material on general classes of probability
distributions on the real line and make connections with orthogonal
polynomials. Luk (1994) and Pickett (2004) give some applications for
$\Gamma$ and $\chi^2$ approximations, but the methodology is very specific
for their applications. Nourdin and Peccati give general theorems for
$\Gamma$ approximation of functionals of Gaussian fields.

There have been a few attempts to apply Stein's method to exponential
approximation. Weinberg (2005) sketches a few potential applications but only
tackles simple examples thoroughly, and Bon (2006) only considers geometric
convolutions. Chatterjee, Fulman, and Röllin (2006) breaks new ground by
applying the method to a challenging problem in spectral graph theory using
exchangeable pairs, but the calculations involved are application-specific
and far from elementary. In this article, in contrast, we develop a general
framework that more conveniently reduces a broad variety of complex
exponential distributional approximation problems to the construction of
couplings. We provide evidence that our approach can be fruitfully applied to
non-trivial applications and in settings with dependence, that is, settings
where Stein's method typically is expected to shine.

The article is organized as follows. In Section 2 we present two abstract
theorems formulated in terms of couplings. We introduce a distributional
transformation (the 'equilibrium distribution' from renewal theory) which has
not yet been extensively explored using Stein's method. We also make use of
Stein couplings similar to those introduced in Chen and Röllin (2009). This
is followed by some more concrete coupling constructions that can be used
along with the theorems. In Section 3 we give applications using these
couplings to obtain exponential approximation rates for the theorem of Rényi
on random sums and generalizations of it, hitting times for Markov chains,
and to the classical theorem of Yaglom on the exponential asymptotic behavior
of a critical Galton-Watson process conditioned on non-extinction; this is
the first place this latter result has appeared in the literature. In
Section 4 we then give the postponed proofs for the main theorems.



2 MAIN RESULTS 

In this section we present the framework in abstract form that will
subsequently be used in concrete applications in Section 3. This framework
comprises two approaches that we will describe here and then prove in
Section 4.

Let us first define the probability metrics used in this article. Define the
sets of test functions

    $\mathcal{F}_K = \{\mathrm{I}[\,\cdot \le z] \mid z \in \mathbb{R}\}$,
    $\mathcal{F}_W = \{h : \mathbb{R} \to \mathbb{R} \mid h \text{ is Lipschitz}, \|h'\| \le 1\}$,
    $\mathcal{F}_{BW} = \{h : \mathbb{R} \to \mathbb{R} \mid h \text{ is Lipschitz}, \|h\| \le 1 \text{ and } \|h'\| \le 1\}$.

Then, for a general set of test functions $\mathcal{F}$, the distance between
two probability measures $P$ and $Q$ with respect to $\mathcal{F}$ is defined
as

    $d_{\mathcal{F}}(P, Q) := \sup_{f \in \mathcal{F}} \left| \int f \, dP - \int f \, dQ \right|$    (2.1)



if the corresponding integrals are well-defined. Denote by $d_K$, $d_W$ and
$d_{BW}$ the respective distances induced by the sets $\mathcal{F}_K$,
$\mathcal{F}_W$ and $\mathcal{F}_{BW}$. The subscripts respectively denote
the Kolmogorov, Wasserstein, and bounded Wasserstein distances. We have

    $d_{BW} \le d_W, \qquad d_K(P, \mathrm{Exp}(1)) \le 1.74 \sqrt{d_W(P, \mathrm{Exp}(1))}$.    (2.2)

The first relation is clear as $\mathcal{F}_{BW} \subset \mathcal{F}_W$. We
refer to Gibbs and Su (2002) for the second relation.
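
As a numerical illustration of (2.2), the following Python sketch approximates $d_K$ and $d_W$ between a point mass at 1 and Exp(1) on a grid. It is not part of the original argument. It assumes the standard fact that on the real line the Wasserstein distance equals the integral of the absolute difference of the distribution functions; the function names, the grid size and the truncation point are illustrative choices.

```python
import math

def exp_cdf(x):
    """Distribution function of Exp(1)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def point_mass_cdf(x, a=1.0):
    """Distribution function of the point mass at a (illustrative example law)."""
    return 1.0 if x >= a else 0.0

def distances_to_exp(cdf, upper=40.0, step=1e-3):
    """Grid approximation of (d_K, d_W) between `cdf` and Exp(1).

    d_K is approximated by the largest gap |F - G| over grid midpoints,
    d_W by the midpoint-rule integral of |F - G| over [0, upper].
    """
    n = int(round(upper / step))
    d_k = 0.0
    d_w = 0.0
    for j in range(n):
        x = (j + 0.5) * step
        gap = abs(cdf(x) - exp_cdf(x))
        d_k = max(d_k, gap)
        d_w += gap * step
    return d_k, d_w

d_k, d_w = distances_to_exp(point_mass_cdf)
print(d_k, d_w, 1.74 * math.sqrt(d_w))
```

For this example the exact values are $d_K = 1 - e^{-1}$ and $d_W = 2e^{-1}$, so the second relation in (2.2) holds with room to spare.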

Our first approach is related to the zero-bias coupling introduced in
Goldstein and Reinert (1997) in the context of normal approximation. It was
also (independently of the current work) previously used by Bon (2006,
Lemma 6) to study geometric convolutions. Whereas zero-bias couplings are
difficult to construct in general (see e.g. Goldstein (2005), Goldstein
(2007) and Ghosh (2009)), it turns out that there is a convenient way to
obtain this type of coupling in our case; see Section 2.1.1.

Definition 2.1. Let $X$ be a non-negative random variable with finite mean.
We say that a random variable $X^e$ has the equilibrium distribution w.r.t.
$X$ if for all Lipschitz $f$

    $\mathbb{E} f(X) - f(0) = \mathbb{E}X \, \mathbb{E} f'(X^e)$.    (2.3)



We use the term 'equilibrium distribution' due to its common use for this
transformation in renewal theory (see Ross and Peköz (2007)): for a
stationary renewal process with inter-event times having distribution $X$,
the time until the next renewal starting from an arbitrary time point has the
equilibrium distribution $X^e$. Though there is some connection, we choose
not to refer to the distribution of $X^e$ as 'exponential zero-biased' since
there are other Stein operators in the context of exponential approximation
(as used by Chatterjee et al. (2006)) that may lead to distributional
characterizations more similar to the actual zero-bias transformation of
Goldstein and Reinert (1997) than to (2.3). The connection between this
transformation and the 'size-biased' transformation is discussed in
Section 2.1.1 below.

For nonnegative $X$ having finite first moment, define the distribution
function

    $F^e(x) = \frac{1}{\mathbb{E}X} \int_0^x \mathbb{P}[X > y] \, dy$    (2.4)

on $x \ge 0$ and $F^e(x) = 0$ for $x < 0$. Then

    $\mathbb{E} f(X) - f(0) = \mathbb{E} \int_0^X f'(s) \, ds
        = \mathbb{E} \int_0^\infty f'(s) \, \mathrm{I}[X > s] \, ds
        = \int_0^\infty f'(s) \, \mathbb{P}[X > s] \, ds$,

so that $F^e$ is the distribution function of $X^e$ and our definition via
(2.3) is consistent with that from renewal theory. It is clear from (2.4)
that $F^e$ always exists if $X$ has finite expectation and that $F^e$ is
absolutely continuous with respect to the Lebesgue measure. It is also clear
from (2.3) that conditions on the moments of $\mathscr{L}(X^e)$ go along with
corresponding conditions on the moments of $\mathscr{L}(X)$: finite $m$-th
moment of $X^e$, $m > 0$, requires finite $(m+1)$-th moment of $X$.
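
The formula (2.4) is easy to evaluate numerically. The following Python sketch (an illustration only; the function names and the midpoint-rule discretisation are assumptions of this example) computes $F^e$ from a survival function and checks two cases: for Exp(1) the equilibrium distribution function coincides with the original one, and for the uniform distribution on $[0,1]$ one gets $F^e(x) = 2x - x^2$.

```python
import math

def equilibrium_cdf(survival, mean, x, steps=20_000):
    """F^e(x) = (1 / E X) * integral_0^x P[X > y] dy, by the midpoint rule.

    `survival` is y -> P[X > y] and `mean` is E X; both are supplied by the caller.
    """
    if x <= 0:
        return 0.0
    h = x / steps
    total = sum(survival((j + 0.5) * h) for j in range(steps))
    return total * h / mean

def exp_survival(y):
    """P[X > y] for X ~ Exp(1)."""
    return math.exp(-y)

def uniform_survival(y):
    """P[X > y] for X uniform on [0, 1]."""
    return max(0.0, 1.0 - y)

# Exp(1) is a fixed point of the transformation: F^e(1) = 1 - exp(-1).
print(equilibrium_cdf(exp_survival, 1.0, 1.0), 1.0 - math.exp(-1.0))
# Uniform[0, 1] has mean 1/2 and F^e(x) = 2x - x^2, so F^e(0.5) = 0.75.
print(equilibrium_cdf(uniform_survival, 0.5, 0.5))
```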

For a random variable $W$ it can be seen that $\mathscr{L}(W) =
\mathscr{L}(W^e)$ if and only if $W$ has an exponential distribution; when
$W$ is exponential it is clear that $\mathscr{L}(W) = \mathscr{L}(W^e)$ and
we show the converse below. Our first result here can be thought of as
formalizing the notion that when $\mathscr{L}(W)$ and $\mathscr{L}(W^e)$ are
approximately equal then $W$ has approximately an exponential distribution.
We not only give bounds for $W$ but also for $W^e$, as this quantity itself
may be of interest; see Section 3.2 for such an example.

Theorem 2.1. Let $W$ be a non-negative random variable with $\mathbb{E}W = 1$
and let $W^e$ have the equilibrium distribution w.r.t. $W$. Then, for any
$\beta > 0$,

    $d_K(\mathscr{L}(W), \mathrm{Exp}(1)) \le 12\beta + 2\,\mathbb{P}[|W^e - W| > \beta]$    (2.5)

and

    $d_K(\mathscr{L}(W^e), \mathrm{Exp}(1)) \le \beta + \mathbb{P}[|W^e - W| > \beta]$.    (2.6)

If in addition $W$ has finite second moment, then

    $d_W(\mathscr{L}(W), \mathrm{Exp}(1)) \le 2\,\mathbb{E}|W^e - W|$    (2.7)

and

    $d_K(\mathscr{L}(W^e), \mathrm{Exp}(1)) \le \mathbb{E}|W^e - W|$;    (2.8)

bound (2.8) also holds for $d_W(\mathscr{L}(W^e), \mathrm{Exp}(1))$.

The key to our second approach is a coupling of three random variables
$(W, W', G)$; this approach was recently introduced by Chen and Röllin
(2009). Here, $W$ is the random variable of interest, $W'$ is a 'small
perturbation' of $W$, and $G$ is an auxiliary random variable which in some
sense brings the coupling into the domain of the Stein operator that is used
for the approximation.

Definition 2.2. A coupling $(W, W', G)$ is called a constant Stein coupling
if

    $\mathbb{E}\{G f(W') - G f(W)\} = \mathbb{E} f(W)$    (2.9)

for all $f$ with $f(0) = 0$ and for which the expectations exist.

This contrasts with the linear Stein coupling introduced in Chen and Röllin
(2009), where the right hand side of (2.9) is replaced by
$\mathbb{E}\{W f(W)\}$ and is used for normal approximation. We emphasize
that the equality in (2.9) need not be exactly satisfied for our next theorem
to be applied. But in order to obtain useful results, (2.9) should be
approximately true. To this end we define

    $r_1(\mathcal{F}) = \sup_{f \in \mathcal{F},\, f(0) = 0} \left| \mathbb{E}\{G f(W') - G f(W) - f(W)\} \right|$.

This term measures the extent to which (2.9) holds with respect to the class
of functions from $\mathcal{F}$ with $f(0) = 0$. Another primary error term
that will appear in the error bound is

    $r_2 = \mathbb{E}\left| \mathbb{E}^{W''}(GD) - 1 \right|$,

where here and in the rest of the article $D := W' - W$. The random variable
$W''$ is defined on the same probability space as $(W, W', G)$ and can be
used to simplify the bounds (it is typically chosen so that $r_2 = 0$); let
$D' := W'' - W$. At first reading one may simply set $W'' = W$ (in which case
typically $r_2 \ne 0$); we refer to Chen and Röllin (2009) for a more
detailed discussion of Stein couplings.

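Theorem 2.2. Let $W$, $W'$, $W''$ and $G$ be random variables with finite
first moments such that also $\mathbb{E}|GD| < \infty$ and
$\mathbb{E}|GD'| < \infty$. Then with the above definitions,

    $d_W(\mathscr{L}(W), \mathrm{Exp}(1)) \le r_1(\mathcal{F}_W) + r_2 + 2r_3 + 2r_3' + 2r_4 + 2r_4'$,    (2.10)

where

    $r_3 = \mathbb{E}\left| GD \, \mathrm{I}[|D| > 1] \right|$,    $r_3' = \mathbb{E}\left| (GD - 1) \, \mathrm{I}[|D'| > 1] \right|$,
    $r_4 = \mathbb{E}\left| G (D^2 \wedge 1) \right|$,    $r_4' = \mathbb{E}\left| (GD - 1)(|D'| \wedge 1) \right|$.

The same bound holds for $d_{BW}$ with $r_1(\mathcal{F}_W)$ replaced by
$r_1(\mathcal{F}_{BW})$. Furthermore, for any $\alpha$, $\beta$ and $\beta'$,

    $d_K(\mathscr{L}(W), \mathrm{Exp}(1)) \le 2r_1(\mathcal{F}_{BW}) + 2r_2 + 2r_5 + 2r_5' + 22(\alpha\beta + 1)\beta' + 12\alpha\beta^2$,    (2.11)

where

    $r_5 = \mathbb{E}\left| GD \, \mathrm{I}[|G| > \alpha \text{ or } |D| > \beta] \right|$,
    $r_5' = \mathbb{E}\left| (1 - GD) \, \mathrm{I}[|G| > \alpha \text{ or } |D| > \beta \text{ or } |D'| > \beta'] \right|$.

2.1 Couplings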

In this section we present a way to construct the distribution of $W^e$ more
explicitly and also a few constant Stein couplings. We note that the main
goal of the coupling $(W, W', G)$ is to achieve (2.9), so that Theorem 2.2
can be applied with $r_1$ being small. We will not discuss how to obtain
$W''$ as this depends very much on the concrete application; see Section 3.1
for an example.

2.1.1. Equilibrium distribution via size biasing. Assume that
$\mathbb{E}W = 1$ and let $W^s$ have the size bias distribution of $W$, i.e.

    $\mathbb{E}\{W f(W)\} = \mathbb{E} f(W^s)$

for all $f$ for which the expectations exist. Then, if $U$ has the uniform
distribution on $[0, 1]$ independent of all else, $W^e := U W^s$ has the
equilibrium distribution w.r.t. $W$. Indeed, for any Lipschitz $f$ with
$f(0) = 0$ we have

    $\mathbb{E} f(W) = \mathbb{E} f(W) - f(0) = \mathbb{E}\{W f'(UW)\} = \mathbb{E} f'(U W^s) = \mathbb{E} f'(W^e)$.

We note that this construction was also considered by Goldstein (2009). It
has been observed by Pakes and Khattree (1992) that for a non-negative random
variable $W$ with $\mathbb{E}W < \infty$, we have that
$\mathscr{L}(W) = \mathscr{L}(U W^s)$ if and only if $W$ has exponential
distribution.

It is clear here that we do not intend to couple $(W, W^s)$ as closely as
possible, but $(W, U W^s)$, so that, unfortunately, we cannot utilize the
large literature available for size-bias coupling, such as Barbour, Holst,
and Janson (1992b), Goldstein and Penrose (2008) and others.
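
The size-bias construction can be checked numerically for a finitely supported $W$. The sketch below is an illustration, not taken from the article: the three-point law, the test function $f = \sin$ and all function names are assumptions of this example. It verifies the identity $\mathbb{E} f(W) - f(0) = \mathbb{E}W \, \mathbb{E} f'(U W^s)$, integrating out the uniform variable $U$ with the midpoint rule.

```python
import math

def size_biased_law(values, probs):
    """Size-biased probabilities: P[W^s = w] = w * P[W = w] / E W."""
    mean = sum(w * q for w, q in zip(values, probs))
    return [w * q / mean for w, q in zip(values, probs)]

def stein_lhs(f, values, probs):
    """Left hand side of (2.3): E f(W) - f(0)."""
    return sum(q * f(w) for w, q in zip(values, probs)) - f(0.0)

def stein_rhs(f_prime, values, probs, steps=2000):
    """Right hand side of (2.3) with W^e = U * W^s: E W * E f'(U W^s).

    The expectation over the uniform U is a midpoint-rule average on [0, 1].
    """
    mean = sum(w * q for w, q in zip(values, probs))
    total = 0.0
    for w, q in zip(values, size_biased_law(values, probs)):
        average = sum(f_prime((j + 0.5) / steps * w) for j in range(steps)) / steps
        total += q * average
    return mean * total

# Hypothetical three-point law with E W = 1.
values = [0.0, 0.5, 2.5]
probs = [0.2, 0.5, 0.3]
print(stein_lhs(math.sin, values, probs), stein_rhs(math.cos, values, probs))
```

The two printed numbers agree up to the quadrature error, as the one-line derivation above predicts.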



2.1.2. Exchangeable pairs. Let $(W, W')$ be an exchangeable pair. Assume that

    $\mathbb{E}^W(W' - W) = -\lambda + \lambda R$ on $\{W > 0\}$.

Then, if we set $G = (W' - W)/(2\lambda)$, we have
$r_1(\mathcal{F}_{BW}) \le \mathbb{E}|R|$ and
$r_1(\mathcal{F}_W) \le \mathbb{E}|RW|$.

This coupling was used by Chatterjee et al. (2006) to obtain an exponential
approximation for the spectrum of the Bernoulli-Laplace Markov chain. In
order to obtain optimal rates, Chatterjee et al. (2006) develop more
application-specific theorems than ours.

2.1.3. Conditional distribution of $W$ given $E^c$. Let $E$ be an event and
let $p = \mathbb{P}[E]$, where $p$ is thought to be small. Assume that $W'$
and $Y$ are defined on the same probability space and that
$\mathscr{L}(W') = \mathscr{L}(W \mid E^c)$ and
$\mathscr{L}(Y) = \mathscr{L}(W \mid E)$. Then, for any Lipschitz $f$ with
$f(0) = 0$, and with $G = (1 - p)/p$,

    $\mathbb{E}\{G f(W') - G f(W)\}$
      $= \frac{1-p}{p} \mathbb{E} f(W') - \frac{1}{p} \mathbb{E} f(W) + \mathbb{E} f(W)$
      $= \frac{1-p}{p} \mathbb{E} f(W') - \frac{1-p}{p} \mathbb{E}(f(W) \mid E^c) - \mathbb{E}(f(W) \mid E) + \mathbb{E} f(W)$
      $= \frac{1-p}{p} \mathbb{E} f(W') - \frac{1-p}{p} \mathbb{E} f(W') - \mathbb{E} f(Y) + \mathbb{E} f(W)$
      $= \mathbb{E} f(W) - \mathbb{E}\{Y f'(UY)\}$,

so that $r_1(\mathcal{F}_W) \le \mathbb{E}Y$. This coupling is mainly used by
Peköz (1996) for geometric approximation in total variation. The Stein
operator used there is a discrete version of the Stein operator used in this
article. Clearly, one will typically aim for an event $E \supset \{W = 0\}$
in order to have $Y = 0$.
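
The computation in Section 2.1.3 only involves the marginal laws of $W$, $W'$ and $Y$, so it can be verified exactly for a finitely supported $W$. The following Python sketch is an illustration under assumed inputs (the law, the test function and the function name are hypothetical): it takes $E = \{W = 0\}$, hence $Y = 0$, and checks with rational arithmetic that (2.9) then holds exactly for a test function with $f(0) = 0$.

```python
from fractions import Fraction as Fr

def constant_stein_gap(law, f):
    """Return E{G f(W') - G f(W)} - E f(W) for the coupling of Section 2.1.3.

    `law` maps values of W to probabilities and must give positive mass p to 0.
    We take E = {W = 0}, so Y = 0, L(W') = L(W | W != 0) and G = (1 - p) / p.
    """
    p = law[Fr(0)]
    g = (1 - p) / p
    law_prime = {w: q / (1 - p) for w, q in law.items() if w != 0}
    e_f_w = sum(q * f(w) for w, q in law.items())
    e_f_w_prime = sum(q * f(w) for w, q in law_prime.items())
    return g * e_f_w_prime - g * e_f_w - e_f_w

# Hypothetical law of W with p = P[W = 0] = 1/10.
law = {Fr(0): Fr(1, 10), Fr(1): Fr(1, 2), Fr(3): Fr(2, 5)}

print(constant_stein_gap(law, lambda w: w * w + 2 * w))  # f(0) = 0, gap is 0
print(constant_stein_gap(law, lambda w: w + 1))          # f(0) = 1, gap is -1
```

The second call shows why the normalisation $f(0) = 0$ matters: adding a constant to $f$ leaves the left hand side of (2.9) unchanged but shifts the right hand side.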

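2.1.4. Conditional distribution of $W'$ given $E^c$. The roles of $W'$ and
$W$ from the previous coupling can be reversed. Let $E$ and $p$ be as before.
However, assume now that $\mathscr{L}(W) = \mathscr{L}(W' \mid E^c)$ and
$\mathscr{L}(Y) = \mathscr{L}(W' \mid E)$. Then, for any Lipschitz $f$ with
$f(0) = 0$, and with $G = -1/p$,

    $\mathbb{E}\{G f(W') - G f(W)\}$
      $= \frac{1}{p} \mathbb{E} f(W) - \frac{1}{p} \mathbb{E} f(W')$
      $= \frac{1}{p} \mathbb{E} f(W) - \mathbb{E} f(Y) - \frac{1-p}{p} \mathbb{E} f(W)$
      $= \mathbb{E} f(W) - \mathbb{E}\{Y f'(UY)\}$,

so that $r_1(\mathcal{F}_W) \le \mathbb{E}Y$.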

3 APPLICATIONS 

3.1 Random sums 

It is well-known that a geometric distribution divided by its mean, when the
mean is large, is close to the standard exponential distribution. If
$N \sim \mathrm{Ge}(p)$ (starting at 1) then we can write
$N = \sum_{i=1}^N 1$ and one may ask whether the 1's in this sum can be
replaced by random variables $X_1, X_2, \ldots$ having mean of order 1, so
that we still obtain an exponential limit. A classical result by Rényi (1957)
states that this is true if the sequence consists of i.i.d. random variables
under the surprisingly mild condition that $X_1$ has finite first moment
(c.f. Sugakova (1990)). A more general result by Brown (1990, pp. 1400/1)
states that $\sum_{i=1}^N X_i / N \to 1$ almost surely is a sufficient
condition for an exponential limit, even when the variables are not i.i.d. or
not independent of $N$.

Quite a few (mostly Soviet) authors have considered the uniform rate of
convergence in the 80s and early 90s; Kalashnikov (1997) is the standard
reference here. Although convergence results for the non-i.i.d. case have
been considered in previous articles, to the best of our knowledge the only
article that goes beyond the above i.i.d. case for uniform bounds is
Sugakova, who considers independent, non-identically distributed random
variables with equal mean, again under the assumption that $N$ has a
geometric distribution. Our aim is to relax the assumptions of independence
of the $X_i$, equal mean, and the geometric distribution of $N$. Let us first
handle the case of some martingale-like dependence structure using the
equilibrium distribution approach. For a random variable $X$ denote by $F_X$
its distribution function and by $F_X^{-1}$ its generalized inverse. We adopt
the standard convention that $\sum_{a}^{b} = 0$ if $b < a$.

Theorem 3.1. Let $X = (X_1, X_2, \ldots)$ be a sequence of square integrable,
non-negative random variables, independent of all else, such that, for all
$i \ge 1$,

    $\mathbb{E}(X_i \mid X_1, \ldots, X_{i-1}) = \mu_i < \infty$ almost surely.    (3.1)

Let $N$ be a positive and integer valued random variable with
$\mathbb{E}N < \infty$ and let $M$ be another random variable with
distribution defined by

    $\mathbb{P}(M = m) = \mu_m \mathbb{P}(N \ge m) / \mu$,    $m = 1, 2, \ldots$    (3.2)

with

    $\mu = \mathbb{E} \sum_{i=1}^{N} X_i = \sum_{m \ge 1} \mu_m \mathbb{P}(N \ge m)$.

Then, with $W = \mu^{-1} \sum_{i=1}^{N} X_i$, we have

    $d_W(\mathscr{L}(W), \mathrm{Exp}(1)) \le 2\mu^{-1} \left( \mathbb{E}|X_M - X_M^e| + \sup_{i \ge 1} \mu_i \, \mathbb{E}|N - M| \right)$,    (3.3)

where each $X_i^e$ is a random variable having the equilibrium distribution
w.r.t. $X_i$ given $X_1, \ldots, X_{i-1}$. If, in addition, $X_i \le C$ for
all $i$ and $|N - M| \le K$, then

    $d_K(\mathscr{L}(W), \mathrm{Exp}(1)) \le 12\mu^{-1} \left\{ \sup_{i \ge 1} \| F_{X_i}^{-1} - F_{X_i^e}^{-1} \| + CK \right\}$;    (3.4)

if $K = 0$, the same bound also holds for unbounded $X_i$.

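Proof. First, let us prove that
$W^e := \mu^{-1} \left( \sum_{i=1}^{M-1} X_i + X_M^e \right)$ has the
equilibrium distribution w.r.t. $W$. Note first from (2.3), (3.1) and the
assumptions on $X_i^e$ that, for Lipschitz $f$ and every $m$,

    $\mu_m \mathbb{E} f'\left( \mu^{-1} \sum_{i=1}^{m-1} X_i + \mu^{-1} X_m^e \right)
        = \mu \, \mathbb{E}\left\{ f\left( \mu^{-1} \sum_{i=1}^{m} X_i \right) - f\left( \mu^{-1} \sum_{i=1}^{m-1} X_i \right) \right\}$.

Note also that, for any Lipschitz function $g$ with $g(0) = 0$ we have

    $\mu \, \mathbb{E}\left\{ \frac{g(M) - g(M-1)}{\mu_M} \right\}
        = \sum_{m > 0} \mathbb{P}(N \ge m) (g(m) - g(m-1)) = \mathbb{E} g(N)$.

We may now assume that $f(0) = 0$. Hence, using the above two facts and
independence between $M$ and the sequence $X_1, X_2, \ldots$, we have

    $\mathbb{E} f'(W^e)
        = \mu \, \mathbb{E}\left\{ \frac{1}{\mu_M} \left[ f\left( \mu^{-1} \sum_{i=1}^{M} X_i \right) - f\left( \mu^{-1} \sum_{i=1}^{M-1} X_i \right) \right] \right\}
        = \mathbb{E} f\left( \mu^{-1} \sum_{i=1}^{N} X_i \right) = \mathbb{E} f(W)$.

Now,

    $W^e - W = \mu^{-1} \left\{ (X_M^e - X_M) + \mathrm{sgn}(M - N) \sum_{i = (M \wedge N) + 1}^{M \vee N} X_i \right\}$.    (3.5)

Due to Strassen's Theorem, we can always couple two random variables $X$ and
$Y$ in such a way that $|X - Y| \le \| F_X^{-1} - F_Y^{-1} \|$. Hence,
choosing

    $\beta = \mu^{-1} \left\{ \sup_{i \ge 1} \| F_{X_i}^{-1} - F_{X_i^e}^{-1} \| + CK \right\}$,

we have $\mathbb{P}[|W^e - W| > \beta] = 0$, and thus, from (2.5), (3.4)
follows; the remark after (3.4) follows similarly. Using (3.5) we can also
easily deduce (3.3) from (2.7). □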



Note that the idea of replacing a single summand by a distributional
transform is omnipresent in the literature related to Stein's method; see
e.g. Goldstein and Reinert (1997) in connection with the zero-bias
distribution for normal approximation.

Remark 3.1. Let $N \sim \mathrm{Ge}(p)$ (starting at 1) and assume that the
$\mu_i$ are bounded from above and bounded away from 0. This implies in
particular that $1/\mu \asymp p$ as $p \to 0$. Now, from the
Kantorovich-Rubinstein Theorem (see Kantorovich and Rubinstein (1958) and
Vallender (1973)) we have that

    $d_W(\mathscr{L}(N), \mathscr{L}(M)) = \inf_{(N, M)} \mathbb{E}|N - M|$    (3.6)

where the infimum ranges over all possible couplings of $N$ and $M$. As (3.3)
holds for any coupling $(N, M)$ we can replace $\mathbb{E}|N - M|$ in (3.3)
by the left hand side of (3.6). To bound this quantity note first that from
(3.2) we deduce

    $\mathbb{E} h(M) = \mathbb{E}\left\{ \frac{\mu_N}{\mu p} h(N) \right\}$    (3.7)

for every function $h$ for which the expectations exist. Note also that

    $\mathbb{E}(\mu_N) = \sum_{k=1}^{\infty} \mu_k \mathbb{P}[N = k] = \sum_{k=1}^{\infty} \mu_k \, p \, \mathbb{P}[N \ge k] = \mu p$.    (3.8)

Let $h$ now be Lipschitz with Lipschitz constant 1 and assume without loss of
generality that $h(0) = 0$, so that $|h(N)| \le N$. Then

    $|\mathbb{E}\{h(M) - h(N)\}| = \left| \mathbb{E}\left\{ \left( \frac{\mu_N}{\mu p} - 1 \right) h(N) \right\} \right|
        \le \mathbb{E}\left| \left( \frac{\mu_N}{\mu p} - 1 \right) N \right|
        \le \frac{\sqrt{\mathrm{Var}(\mu_N) \, \mathbb{E}N^2}}{\mu p}
        \le \frac{\sqrt{2 \, \mathrm{Var}(\mu_N)}}{\mu p^2}$.

Hence, under the assumptions of this remark, for the second term in (3.3) to
converge to zero it is sufficient that $\mathrm{Var}(\mu_N) \to 0$ as
$p \to 0$. The following remark gives a sufficient condition for this.

Remark 3.2. We say that a sequence $a_0, a_1, a_2, \ldots$ is Abel-summable
with limit $A$ if for all $|x| < 1$ the sum $\sum_{k \ge 0} a_k x^k$ exists
and if

    $\sum_{k \ge 0} a_k x^k \to A$ as $x \to 1$.

Now, if $N \sim \mathrm{Ge}(p)$ and $s_k = a_0 + a_1 + \cdots + a_k$ are the
partial sums, we have

    $\mathbb{E}(s_{N-1}) = p \sum_{k \ge 0} s_k (1 - p)^k = \sum_{k \ge 0} a_k (1 - p)^k$.

Hence, we have
$\mathrm{Var}(\mu_N) = \mathbb{E}(\mu_N^2) - (\mathbb{E}\mu_N)^2 \to 0$ as
$p \to 0$ if and only if the sequences
$\mu_1^2, \mu_2^2 - \mu_1^2, \mu_3^2 - \mu_2^2, \ldots$ and
$\mu_1, \mu_2 - \mu_1, \mu_3 - \mu_2, \ldots$ are Abel-summable with limits
$A^2$ and $A$ respectively. This is certainly the case if $\mu_k \to A$ as
$k \to \infty$, because Abel-summability is consistent with regular
summability; see Korevaar (2004, p. 4).
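
The identity $\mathbb{E}(s_{N-1}) = \sum_{k \ge 0} a_k (1-p)^k$ from Remark 3.2 can be checked numerically. The Python sketch below is only an illustration: the sequence $a_k = 2^{-k}$, the truncation level and the function names are assumptions of this example. For this sequence the ordinary sum is 2, and the Abel sum at $x = 1 - p$ equals $1/(1 - x/2)$.

```python
def expected_partial_sum(a, p, terms=5000):
    """E(s_{N-1}) for N ~ Ge(p) on {1, 2, ...}, using P[N - 1 = k] = p (1 - p)^k.

    The series is truncated after `terms` summands (an approximation).
    """
    total, partial, weight = 0.0, 0.0, p
    for k in range(terms):
        partial += a(k)          # s_k = a_0 + ... + a_k
        total += partial * weight
        weight *= 1.0 - p
    return total

def abel_sum(a, x, terms=5000):
    """Truncated power series sum_k a_k x^k."""
    total, power = 0.0, 1.0
    for k in range(terms):
        total += a(k) * power
        power *= x
    return total

def a(k):
    """Illustrative summable sequence with ordinary sum 2."""
    return 0.5 ** k

p = 0.05
print(expected_partial_sum(a, p), abel_sum(a, 1.0 - p))
print(abel_sum(a, 1.0 - 1e-3))  # close to the ordinary sum 2 as x -> 1
```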

Next is a corollary, and first we need a definition. 

Definition 3.3. A nonnegative random variable $X$ with finite mean is said to
be NBUE (new better than used in expectation) or NWUE (new worse than used in
expectation) if we respectively have either
$\mathbb{E}[X - s \mid X > s] \le \mathbb{E}[X]$ for all $s > 0$ or
$\mathbb{E}[X - s \mid X > s] \ge \mathbb{E}[X]$ for all $s > 0$. If $X$ is
nonnegative integer-valued then we say it is discrete NBUE (discrete NWUE) if
we have $\mathbb{E}[X - s \mid X > s] \le (\ge) \, \mathbb{E}[X]$ for all
$s = 1, 2, \ldots$.

It can be shown (see Shaked and Shanthikumar (2007, Theorem 1.A.31)) that if
$X$ is NBUE then $X^e \le_{st} X$ and if $X$ is NWUE then $X^e \ge_{st} X$.
If $N$ is integer valued and discrete NBUE (discrete NWUE) then it is also
true (see Sengupta, Chatterjee, and Chakraborty (1995, p. 1477)) that for the
corresponding $M$ in (3.2) with $1 = \mu_1 = \mu_2 = \cdots$ we have
$M \le_{st} (\ge_{st}) \, N$. In this case we can couple $X$ and $X^e$ so
either $X^e \le X$ or $X^e \ge X$ holds (and $N$ and $M$ so either
$N \le M$ or $N \ge M$ holds) and we then have

    $\mathbb{E}|X^e - X| = |\mathbb{E}X^e - \mathbb{E}X| = \left| \frac{\mathbb{E}X^2}{2\,\mathbb{E}X} - \mathbb{E}X \right|$    (3.9)

and also

    $\mathbb{E}|M - N| = |\mathbb{E}M - \mathbb{E}N| = \left| \frac{\mathbb{E}N^2}{2\,\mathbb{E}N} + \frac{1}{2} - \mathbb{E}N \right|$    (3.10)

and thus we immediately obtain the following corollary.



Corollary 3.2. Consider the situation from Theorem 3.1 and assume in addition
that the $X_i$ are independent with $\mathbb{E}X_i = 1$ and, for each $i$, we
have that $X_i$ is either NBUE or NWUE. If $N$ is integer-valued and either
discrete NBUE or discrete NWUE then

    $d_W(\mathscr{L}(W), \mathrm{Exp}(1)) \le 2\mu^{-1} \sup_i \left| \tfrac{1}{2} \mathbb{E}X_i^2 - 1 \right| + 2 \left| \frac{\mathbb{E}N^2}{2\mu^2} + \frac{1}{2\mu} - 1 \right|$.    (3.11)



Example 3.4 (Geometric convolution of i.i.d. random variables). Assume that
$N \sim \mathrm{Ge}(p)$ and that $\mathbb{E}X_1 = 1$. Then, it is
straightforward that $\mathscr{L}(M) = \mathscr{L}(N)$, hence we can set
$M = N$. Denote by $\delta(\mathcal{F})$ the distance between
$\mathscr{L}(X_1)$ and $\mathrm{Exp}(1)$ as defined in (2.1) with respect to
the set of test functions $\mathcal{F}$; define $\delta^e(\mathcal{F})$
analogously but between $\mathscr{L}(X_1)$ and $\mathscr{L}(X_1^e)$. In this
case the estimates of Theorem 3.1 reduce to

    $d_W(\mathscr{L}(W), \mathrm{Exp}(1)) \le 2p \, \delta^e(\mathcal{F}_W)$,    (3.12)
    $d_K(\mathscr{L}(W), \mathrm{Exp}(1)) \le 12p \, \| F_{X_1}^{-1} - F_{X_1^e}^{-1} \|$.    (3.13)

Inequality (3.12) follows again from the Kantorovich-Rubinstein Theorem. Note
that these two bounds are small not only if $p$ is small but also if the
$X_i$ are close to exponential.
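
As a sanity check of (3.13), consider the degenerate case $X_i \equiv 1$, so that $W = pN$. By (2.4) the equilibrium distribution of the constant 1 is uniform on $[0,1]$; assuming the norm in (3.13) is the supremum norm over $(0,1)$, this gives $\| F_{X_1}^{-1} - F_{X_1^e}^{-1} \| = 1$ and hence the bound $12p$. The Python sketch below is an illustration added here, not a computation from the article; the function name and the truncation point are assumptions. It computes the Kolmogorov distance by checking both sides of every jump of $pN$.

```python
import math

def kolmogorov_geometric_vs_exp(p, tail=60.0):
    """d_K between L(pN), N ~ Ge(p) on {1, 2, ...}, and Exp(1).

    The law of pN has jumps at x = k p. Between jumps its distribution function
    is constant and the exponential one is increasing, so the supremum is
    attained just before or at a jump. Jumps beyond `tail` are ignored since
    both survival functions are negligible there.
    """
    distance = 0.0
    k = 1
    while k * p <= tail:
        exp_survival = math.exp(-k * p)
        survival_before = (1.0 - p) ** (k - 1)   # P[pN >= k p]
        survival_after = (1.0 - p) ** k          # P[pN >  k p]
        distance = max(distance,
                       abs(survival_before - exp_survival),
                       abs(survival_after - exp_survival))
        k += 1
    return distance

for p in (0.2, 0.05, 0.01):
    print(p, kolmogorov_geometric_vs_exp(p), 12 * p)
```

In this toy case the true distance is close to $1 - e^{-p}$, so the bound $12p$ has the right order in $p$ but a conservative constant.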

Let us compare this with analogous results from the literature. From
Kalashnikov (1997, Theorem 3.1 for $s = 2$, page 151) we have the (slightly
simplified) bound

    $d_W(\mathscr{L}(W), \mathrm{Exp}(1)) \le p \, \delta(\mathcal{F}_W) + 2p \, \delta(\mathcal{F}_2)$,    (3.14)

where

    $\mathcal{F}_2 = \{ f \in C^1(\mathbb{R}) \mid f' \in \mathcal{F}_W \}$.

Known results from the literature typically involve a direct measure of
closeness between $X_1$ and $\mathrm{Exp}(1)$, whereas our results
(3.12)-(3.13) incorporate a measure of closeness between $\mathscr{L}(X_1)$
and its equilibrium distribution. By means of (2.7) and again the
Kantorovich-Rubinstein Theorem we have the relation

    $\delta(\mathcal{F}_W) \le 2 \, \delta^e(\mathcal{F}_W)$.

Let $Z \sim \mathrm{Exp}(1)$ and let $h$ be a differentiable function with
$h(0) = 0$. Then, recalling that $\mathscr{L}(Z^e) = \mathscr{L}(Z)$, we have
from (2.3) that $\mathbb{E}h(Z) = \mathbb{E}h'(Z)$ and, using again (2.3) for
$X_1$ and $X_1^e$,

    $\mathbb{E}h(X_1) - \mathbb{E}h(Z) = \mathbb{E}h'(X_1^e) - \mathbb{E}h'(Z)$.    (3.15)

This implies

    $\delta(\mathcal{F}_2) = d_W(\mathscr{L}(X_1^e), \mathrm{Exp}(1))$    (3.16)

and hence, from (2.8), we have
$\delta(\mathcal{F}_2) \le \delta^e(\mathcal{F}_W)$, so that (3.14) gives a
bound which is not as good as our (3.12) if the bound is to be expressed in
terms of $\delta^e(\mathcal{F}_W)$.

On the other hand, from (3.16) and the triangle inequality,

    $\delta^e(\mathcal{F}_W) \le \delta(\mathcal{F}_W) + d_W(\mathscr{L}(X_1^e), \mathrm{Exp}(1)) = \delta(\mathcal{F}_W) + \delta(\mathcal{F}_2)$.

Hence, our bound is not as good as (3.14) if the bound is to be expressed in
terms of closeness of $\mathscr{L}(X_1)$ to $\mathrm{Exp}(1)$. It seems
therefore that, although much broader in applicability, our theorems are able
to yield comparable results for the case of geometric convolutions, at least
with respect to the Wasserstein metric.

In the case where J^{Xi) has a density, we have from lKalashnikovl ([1997, 
Theorem 4.1 for s = 2, page 152) the (again slightly simplified) bound 

dK(^(W^),Exp(l)) ^P,5(.Fk) + (1 + p)p<5(.Fw)+p(1A2p)5(7-2), (3.17) {26c} 

where p is the supremum of the density of $\mathscr{L}(W)$. Then, according
to Kalashnikov and Vsekhsvyatskii (1989, p. 99), p can be bounded by



J^(.Fw) + 2supjFj,^(x)-(l-Fx,(x))|) 
(2-5-(.Fw)) 

where $F_{X_1}$ is the distribution function of $\mathscr{L}(X_1)$ with
derivative $F'_{X_1}$. Kalashnikov and Vsekhsvyatskii (1989) also obtain a
more complicated bound for general, non-continuous $X_i$; we refer to their
paper for that result. It seems difficult to directly compare this bound with
our result (3.13) as there is no obvious correspondence between the right
hand side quantity in (3.13) and the quantities appearing in (3.17).

If for each $i$ the variable $X_i$ is either NBUE or NWUE, (3.12) in
combination with (2.2) gives us

    $d_K(\mathscr{L}(W), \mathrm{Exp}(1)) \le 2.47 \left( p \sup_i \left| \tfrac{1}{2} \mathbb{E}X_i^2 - 1 \right| \right)^{1/2}$.    (3.18)

Although (3.18) is non-optimal in rate, it is purely formulated in terms of
moments of the involved random variables. Results similar to (3.18) have
appeared elsewhere: Theorem 6.1 in Brown and Ge (1984) has the same result
with a larger constant, but Brown (1990) and Daley (1988) subsequently
derived significant improvements of this result.

Let us look now at a more flexible, but less precise approach. In particular,
we do not assume a martingale-like dependence like (3.1). This comes at the
cost of a non-vanishing bound in the case where the summands are i.i.d.
exponential and the number of summands is geometrically distributed. Note
that, in the context of this approach, it is more convenient to think of $N$
as starting from 0. If it starts at 1, then the modifications are minor; in
particular, one needs to make use of the random variable $Y$ from
Section 2.1.3.

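Theorem 3.3. Let $X = (X_1, X_2, \ldots)$ be a sequence of random variables
with $\mathbb{E}X_i = \mu_i$ and $\mathbb{E}X_i^2 < \infty$. Let $N$, $N'$
and $N''$ be non-negative, square integrable, integer valued random variables
independent of the sequence $X$. Assume that

    $p := \mathbb{P}[N = 0] > 0$,    $\mathscr{L}(N') = \mathscr{L}(N \mid N > 0)$,    $N'' \le N \le N'$,

and write $q := 1 - p$. Define $S(k, l) := X_{k+1} + \cdots + X_l$ for
$k < l$ and $S(k, l) = 0$ for $k \ge l$. Let $\mu = \mathbb{E}S(0, N)$ and
$W = S(0, N)/\mu$. Then

    $d_W(\mathscr{L}(W), \mathrm{Exp}(1)) \le \frac{qs}{p\mu} + \frac{4q \, \mathbb{E}\{S(N, N')(1 + S(N'', N))\}}{p\mu^2} + \frac{4\,\mathbb{E}S(N'', N)}{\mu}$,

where $s^2 = \mathrm{Var}\, \mathbb{E}(S(N, N') \mid \mathcal{F}_{N''})$ and
$\mathcal{F}_k := \sigma(X_1, \ldots, X_k)$. If, in addition,

    $X_i \le C$,    $N' - N \le K_1$,    $N - N'' \le K_2$,    (3.19)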

for positive constants C, Ki and K2, then 

PH f-l P/J,'' 

Proof. We make use of the coupling construction from Section 2.1.3. Let
$E = \{N = 0\}$, let $Y = 0$, let $W' = \mu^{-1} \sum_{i=1}^{N'} X_i$ and
likewise $W'' = \mu^{-1} \sum_{i=1}^{N''} X_i$. Then the conditions of
Section 2.1.3 are satisfied with $G = (1 - p)/p$ and we can apply
Theorem 2.2, in particular (2.11). We have $r_1(\mathcal{F}_{BW}) = 0$ as
proved in Section 2.1.3. Note now that $D = S(N, N')$ and
$D' = S(N'', N)$. Hence
$r_2 = \mathbb{E}|1 - q(p\mu)^{-1} \mathbb{E}(S(N, N') \mid \mathcal{F}_{N''})|$.
As (2.9) implies that $\mathbb{E}(GD) = \mathbb{E}W = 1$ the variance bound
of $r_2$ follows. The
dw-bound follows from (I2.10p . using the rough estimates r^ + r^ ^ lE(GD^) 
and Tg + r4 ^ 21ED' + 2Ei{GDD') as we assume bounded second moments. 
To obtain the dx-bound choose a = G = {I — p)/p, (3 = CKi/jj. and 
P' = CK2/H; then rg = = 0. Hence (f2TT]l yields 

dK(^(H^), Exp(l)) ^ ra + 22(a/3 + 1)/?' + Uafi"^ . 

Plugging in the value for r2 and the constants, the theorem is proved. □ 

Example 3.5 (Geometric convolution under local dependence). If
$N + 1 \sim \mathrm{Ge}(p)$ (that is, $N$ has a geometric distribution
starting at 0) we can choose $N' = N + 1$, as
$\mathscr{L}(N \mid N > 0) = \mathscr{L}(N + 1)$ due to the well-known
lack-of-memory property; hence $K_1 = 1$. Assume now there is a non-negative
integer $m$ such that, for each $i$, $(X_1, \ldots, X_i)$ is independent of
$(X_{i+m+1}, X_{i+m+2}, \ldots)$. We can set $N'' = \max(N - m, 0)$, hence
$s^2 \le \mathrm{Var}\, \mu_{N+1}$, where $\mu_i := \mathbb{E}X_i$. Assume
also that $\mu_i \ge \mu_0$ for some $\mu_0 > 0$, so that
$\mu \ge \mu_0/p$. Hence Theorem 3.3 yields

dK{^(»').Exp{l)) < v;M^^ + ^ 2C^p(lln, + 6) 

/^o A^o A*o 

Hence, convergence is obtained only if $\mathrm{Var}(\mu_{N+1}) \to 0$ as
$p \to 0$; c.f. Remarks 3.1 and 3.2.



3.2 First passage times 

It is well-known that the time until the occurrence of a rare event can often
be well approximated by an exponential distribution; Aldous (1989) gives a
wide survey of the settings where this phenomenon occurs, and Aldous and Fill
(Preprint) summarize many results in the setting of Markov chain hitting
times. The articles by Aldous and Brown (1992) and Aldous and Brown (1993)
are other good entry points to the large literature on approximately
exponential passage times in Markov chains.

Let $X_0, X_1, \ldots$ be an ergodic and stationary Markov chain taking
values in a denumerable space $\mathcal{X}$ with transition probability
matrix $P = (P_{ij})_{i,j \in \mathcal{X}}$ and stationary distribution
$\pi = (\pi_i)_{i \in \mathcal{X}}$ and let

    $T_{\pi,i} = \inf\{t \ge 0 : X_t = i\}$

be the time of the first visit to state $i$ when started at time 0 according
to the stationary distribution $\pi$ and let

    $T_{i,j} = \inf\{t > 0 : X_t = j\}$ starting with $X_0 = i$

be the first time the Markov chain started in state $i$ next visits state
$j$. The Markov chains in the two definitions may be separate copies coupled
together on the same probability space.

Corollary 3.4. With the above definitions we have

    $d_K(\mathscr{L}(\pi_i T_{\pi,i}), \mathrm{Exp}(1)) \le 1.5\pi_i + \pi_i \mathbb{E}|T_{\pi,i} - T_{i,i}|$    (3.21)

and

    $d_K(\mathscr{L}(\pi_i T_{\pi,i}), \mathrm{Exp}(1)) \le 2\pi_i + \mathbb{P}(T_{\pi,i} \ne T_{i,i})$.    (3.22)

Proof. We first claim that, with $U$ a uniform $[0, 1]$ random variable
independent of all else,
$\mathscr{L}(T_{i,i}^e) = \mathscr{L}(T_{\pi,i} + U)$. To see this, first
note that

    $\mathbb{P}(T_{\pi,i} = k) = \pi_i \mathbb{P}(T_{i,i} > k)$

follows because visits to state $i$ constitute a renewal process and a cycle
(having mean length $\mathbb{E}T_{i,i} = 1/\pi_i$) will have precisely one
time $t$ when the excess

    $Y_t = \inf\{s \ge t : X_s = i\} - t$

is exactly equal to $k$ if and only if the cycle length is greater than $k$.
We then apply the renewal reward theorem; see Ross and Peköz (2007). So with
$f(0) = 0$ and using (2.3) we have

    $\mathbb{E}f'(T_{\pi,i} + U) = \mathbb{E}\{f(T_{\pi,i} + 1) - f(T_{\pi,i})\}$
      $= \pi_i \sum_k \mathbb{P}(T_{i,i} > k)(f(k+1) - f(k))$
      $= \pi_i \sum_k \sum_{j > k} \mathbb{P}(T_{i,i} = j)(f(k+1) - f(k))$
      $= \pi_i \sum_j \sum_{0 \le k < j} \mathbb{P}(T_{i,i} = j)(f(k+1) - f(k))$
      $= \pi_i \sum_j \mathbb{P}(T_{i,i} = j) f(j)$

and the claim follows from (2.3). We then have

    $d_K(\mathscr{L}(\pi_i T_{\pi,i}), \mathrm{Exp}(1)) \le \pi_i + d_K(\mathscr{L}(\pi_i (T_{\pi,i} + U)), \mathrm{Exp}(1))$
      $= \pi_i + d_K(\mathscr{L}(\pi_i T_{i,i}^e), \mathrm{Exp}(1))$    (3.23)

where we use
$d_K(\mathscr{L}(T_{\pi,i}), \mathscr{L}(T_{\pi,i} + U)) \le \pi_i$ in the
first line and $\mathscr{L}(T_{i,i}^e) = \mathscr{L}(T_{\pi,i} + U)$ in the
second line. We obtain inequality (3.21) from (3.23) and (2.8) and then using

    $\mathbb{E}|T_{\pi,i} + U - T_{i,i}| \le \mathbb{E}U + \mathbb{E}|T_{\pi,i} - T_{i,i}| = 0.5 + \mathbb{E}|T_{\pi,i} - T_{i,i}|$

and we obtain (3.22) from (3.23) and (2.6) using $\beta = \pi_i$ (since
$\{|T_{\pi,i} + U - T_{i,i}| > 1\}$ implies
$\{T_{\pi,i} \ne T_{i,i}\}$). □
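
The renewal identity $\mathbb{P}(T_{\pi,i} = k) = \pi_i \mathbb{P}(T_{i,i} > k)$ used at the start of the proof can be verified numerically on a small chain. The Python sketch below is an illustration only: the three-state transition matrix, the power-iteration computation of $\pi$ and all function names are assumptions of this example, not constructions from the article.

```python
def stationary_distribution(P, iterations=5000):
    """Approximate the stationary law of an ergodic chain by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[j] * P[j][k] for j in range(n)) for k in range(n)]
    return pi

def first_visit_pmf(P, start, i, kmax):
    """P(T = k) for k = 0..kmax, where T = inf{t >= 0 : X_t = i} and X_0 ~ start."""
    others = [j for j in range(len(P)) if j != i]
    pmf = [start[i]]
    alive = {j: start[j] for j in others}   # mass that has not yet visited i
    for _ in range(kmax):
        pmf.append(sum(alive[j] * P[j][i] for j in others))
        alive = {k: sum(alive[j] * P[j][k] for j in others) for k in others}
    return pmf

def return_time_survival(P, i, kmax):
    """P(T_{i,i} > k) for k = 0..kmax, where T_{i,i} = inf{t > 0 : X_t = i}, X_0 = i."""
    others = [j for j in range(len(P)) if j != i]
    survival = [1.0]
    alive = {j: P[i][j] for j in others}    # after one step, not yet back at i
    for _ in range(kmax):
        survival.append(sum(alive.values()))
        alive = {k: sum(alive[j] * P[j][k] for j in others) for k in others}
    return survival

# Hypothetical ergodic three-state chain.
P = [[0.5, 0.4, 0.1],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]
pi = stationary_distribution(P)
pmf = first_visit_pmf(P, pi, 0, 25)
survival = return_time_survival(P, 0, 25)
print(max(abs(pmf[k] - pi[0] * survival[k]) for k in range(26)))
```

The printed maximal discrepancy is at the level of floating point rounding, in line with the identity.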






We say that a stopping time $T_{i,\pi}$ is a stationary time starting from
state $i$ if $\mathscr{L}(X_{T_{i,\pi}} \mid X_0 = i) = \pi$. Also, whenever
$T_{i,j}$ and $T_{i,\pi}$ are used together in an expression, it is
understood that they are both based on the same copy of the Markov chain
started from state $i$.

Corollary 3.5. With the above definitions and
$p = \mathbb{P}[T_{i,i} < T_{i,\pi}]$,

    $d_K(\mathscr{L}(\pi_i T_{\pi,i}), \mathrm{Exp}(1)) \le \pi_i \left( 1.5 + \mathbb{E}T_{i,\pi} + p \sup_j \mathbb{E}T_{j,i} \right)$    (3.24)

and

    $d_K(\mathscr{L}(\pi_i T_{\pi,i}), \mathrm{Exp}(1)) \le 2\pi_i + \sum_{n=1}^{\infty} \left| P_{ii}^{(n)} - \pi_i \right|$.    (3.25)

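Proof. Letting $X_0 = i$,

$$T_{\pi,i} = \inf\{t \ge 0 : X_{T_{i,\pi}+t} = i\}$$

and

$$T_{i,i} = \inf\{t > 0 : X_t = i\}$$

and $A = \{T_{i,i} < T_{i,\pi}\}$ we have

$$\mathbb{E}|T_{\pi,i} - T_{i,i}| \le \mathbb{E}T_{i,\pi} + \rho\, \mathbb{E}[T_{\pi,i} \mid A],$$

since $|T_{\pi,i} - T_{i,i}| = T_{i,\pi}$ on $A^c$ and $|T_{\pi,i} - T_{i,i}| \le T_{i,\pi} + T_{\pi,i}$ on $A$,
and (3.24) follows from (3.21) after noting $\mathbb{E}[T_{\pi,i} \mid A] \le \sup_j \mathbb{E}T_{j,i}$.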

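For (3.25), let $X_1, X_2, \ldots$ be the stationary Markov chain and let $Y_0, Y_1, \ldots$
be a coupled copy of the Markov chain started in state $i$ at time 0, but let
$Y_1, Y_2, \ldots$ be coupled with $X_1, X_2, \ldots$ according to the maximal coupling of
Griffeath (1974/75) so that we have $\mathbb{P}(X_n = Y_n = i) = \pi_i \wedge P^{(n)}_{i,i}$. Let $T_{\pi,i}$
and $T_{i,i}$ be hitting times respectively defined on these two Markov chains.
Then

$$\mathbb{P}(T_{\pi,i} \ne T_{i,i}) \le \sum_n \bigl[\mathbb{P}(X_n = i, Y_n \ne i) + \mathbb{P}(Y_n = i, X_n \ne i)\bigr]$$

and since

$$\mathbb{P}(X_n = i, Y_n \ne i) = \pi_i - \mathbb{P}(X_n = i, Y_n = i) = \pi_i - \pi_i \wedge P^{(n)}_{i,i}$$

and a similar calculation yields

$$\mathbb{P}(Y_n = i, X_n \ne i) = P^{(n)}_{i,i} - \pi_i \wedge P^{(n)}_{i,i},$$

the two probabilities sum to $|P^{(n)}_{i,i} - \pi_i|$, and then we obtain the result from (3.22). □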
Example 3.6. With the above definitions and further assuming $X_n$ is an
$m$-dependent Markov chain, we can let $T_{i,\pi} = m$ and we thus have

$$d_K(\mathcal{L}(\pi_i T_{\pi,i}), \mathrm{Exp}(1)) \le \pi_i \bigl(m + 1.5 + \mathbb{P}(T_{i,i} < m) \sup_j \mathbb{E}T_{j,i}\bigr)$$

and

$$d_K(\mathcal{L}(\pi_i T_{\pi,i}), \mathrm{Exp}(1)) \le 2\pi_i + \sum_{n=1}^{m-1} |P^{(n)}_{i,i} - \pi_i|.$$



If we consider flipping a biased coin repeatedly, let $T$ be the number of flips
required until the beginning of a given pattern of heads and tails of length
$k$ first appearing as a run. The current run of $k$ flips can be encoded in the
state space of a $k$-dependent Markov chain. Suppose the Markov chain at
time $n$ is in state $i$ if some given desired pattern that cannot overlap with
itself appears as a run starting with flip $n$. This means $\mathbb{P}(T_{i,i} < m) = 0$ and
$P^{(n)}_{i,i} = 0$ for $n < m$. By applying the second result above we then obtain

$$d_K(\mathcal{L}(\pi_i T_{\pi,i}), \mathrm{Exp}(1)) \le \pi_i (k + 1).$$

If we are instead interested in the time until a run of $k$ heads first appears, we
can use the "de-clumping" trick of waiting for the (non-overlapping) pattern
of tails followed by $k$ heads in a row and notice that this differs from the first
appearance time of $k$ heads in a row only if the first $k$ flips are heads. If we
let $T$ be the number of flips required prior to the start of a run of $k$ heads
in a row, we have

$$d_K(\mathcal{L}(q p^k T), \mathrm{Exp}(1)) \le (k + 2) p^k$$

where $p = 1 - q$ is the probability of heads. This result is nearly a factor of 2
improvement over the result from Barbour et al. (1992b, Page 164), where a
Poisson approximation is used for the number of times the pattern appears
to estimate the error bound on the exponential tail probability. This type
of bound also appears in the context of geometric approximation in Peköz
(1996).
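The bound $\pi_i(k+1)$ for a non-overlapping pattern can be compared with the exact Kolmogorov distance in a concrete case. The sketch below is an illustration under stated assumptions, not the authors' computation: it takes a fair coin and the pattern `HHHHHT` (both hypothetical choices), and uses the elementary renewal recursion $a_n = \pi(1 - \sum_{m \le n-k} a_m)$ for $a_n = \mathbb{P}(T = n)$, which holds because two occurrences of a non-overlapping pattern must start at least $k$ flips apart. Here $T$ counts flips before the pattern starts, so $T = 0$ is possible.

```python
import math


def is_non_overlapping(pattern):
    """True if no proper prefix of the pattern equals the suffix of the same length."""
    return all(pattern[:j] != pattern[-j:] for j in range(1, len(pattern)))


def first_occurrence_pmf(pi, k, n_max):
    """P(T = n), n = 0..n_max-1, for a length-k non-overlapping pattern.

    pi is the probability that the pattern starts at any given flip.
    """
    pmf = []
    cum = [0.0]  # cum[j] = P(T < j)
    for n in range(n_max):
        # earlier first occurrences compatible with an occurrence at n start at m <= n - k
        before = cum[n - k + 1] if n - k + 1 > 0 else 0.0
        a = pi * (1.0 - before)
        pmf.append(a)
        cum.append(cum[-1] + a)
    return pmf


def kolmogorov_to_exp(pmf, scale):
    """sup_x |P(scale * T <= x) - (1 - exp(-x))| for T with the given pmf on 0, 1, 2, ..."""
    d = 0.0
    cdf = 0.0
    for n, p in enumerate(pmf):
        g = 1.0 - math.exp(-scale * n)
        d = max(d, abs(cdf - g))  # left limit at the jump point x = scale * n
        cdf += p
        d = max(d, abs(cdf - g))  # value at the jump point
    return d


# Hypothetical example: fair coin, pattern of five heads followed by a tail.
p_heads = 0.5
pattern = "HHHHHT"
k = len(pattern)
pi = 1.0
for c in pattern:
    pi *= p_heads if c == "H" else 1.0 - p_heads
pmf = first_occurrence_pmf(pi, k, n_max=int(40 / pi))
d_k = kolmogorov_to_exp(pmf, scale=pi)
bound = pi * (k + 1)

if __name__ == "__main__":
    print("exact distance:", d_k, "bound pi*(k+1):", bound)
```

The computed distance should fall below the bound; how much slack there is depends on the pattern and the coin bias chosen.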



Our next results (and improvements of them) have previously appeared
in the literature in Brown and Ge (1984), Brown (1990), and Daley (1988).
Recall the definitions of NBUE and NWUE given earlier. The following
is an immediate consequence of Theorem 2.1, (3.9) and (2.2).



Corollary 3.6. If $W$ is either NBUE or NWUE with $\mathbb{E}W = 1$, finite second
moment and letting

$$\rho = \left|\tfrac{1}{2}\mathbb{E}W^2 - 1\right|$$

we have

$$d_W(\mathcal{L}(W), \mathrm{Exp}(1)) \le 2\rho, \qquad d_K(\mathcal{L}(W), \mathrm{Exp}(1)) \le 2.47\rho^{1/2}, \tag{3.26}$$

and

$$d_W(\mathcal{L}(W^e), \mathrm{Exp}(1)) \le \rho, \qquad d_K(\mathcal{L}(W^e), \mathrm{Exp}(1)) \le \rho. \tag{3.27}$$
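As a small illustration of the first bound in (3.26), take $W$ uniform on $[0, 2]$, a hypothetical example chosen here because $\mathbb{E}W = 1$ and its mean residual life $(2 - t)/2$ never exceeds 1, so it is NBUE. The sketch below assumes the standard representation of the Wasserstein distance on the real line as the integral of the absolute difference of distribution functions, and approximates that integral with a midpoint rule; the function names are illustrative.

```python
import math


def wasserstein_to_exp(cdf, upper=40.0, steps=200000):
    """Approximate d_W(F, Exp(1)) as the integral of |F(x) - (1 - e^{-x})| over [0, upper]."""
    dx = upper / steps
    total = 0.0
    for j in range(steps):
        x = (j + 0.5) * dx  # midpoint of the j-th cell
        total += abs(cdf(x) - (1.0 - math.exp(-x))) * dx
    return total


def uniform_cdf(x):
    """Distribution function of W ~ Uniform[0, 2], so that EW = 1."""
    return min(max(x / 2.0, 0.0), 1.0)


second_moment = 4.0 / 3.0             # E W^2 for Uniform[0, 2]
rho = abs(0.5 * second_moment - 1.0)  # rho = |E W^2 / 2 - 1|
d_w = wasserstein_to_exp(uniform_cdf)

if __name__ == "__main__":
    print("d_W approx:", d_w, "bound 2*rho:", 2.0 * rho)
```

The numerical distance should sit below $2\rho = 2/3$; the truncation at `upper` ignores only an exponentially small tail.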



Remark 3.7. The class of stochastic processes in applications where first
passage times are either NBUE or NWUE is quite large: see Karasu and
Özekici (1989) or Lam (1992) for a survey and some examples. The class
of mixtures of exponential distributions, also called completely monotone
distributions, appears prominently in the setting of reversible Markov chain
hitting times; see Aldous and Fill (Preprint). It is straightforward to verify
that the class of completely monotone distributions is a subset of the class
of NWUE distributions.



3.3 Critical Galton-Watson branching process

Let $Z_0 = 1, Z_1, Z_2, \ldots$ be a Galton-Watson branching process with offspring
distribution $\nu = \mathcal{L}(Z_1)$. A theorem due to Yaglom (1947) states that, if
$\mathbb{E}Z_1 = 1$ and $\mathrm{Var}\,Z_1 = \sigma^2 < \infty$, then $\mathcal{L}(n^{-1} Z_n \mid Z_n > 0)$ converges to an
exponential distribution with mean $\sigma^2/2$. We give a rate of convergence for
this asymptotic under finite third moment of the offspring distribution using
the idea from Section 2.1.1. Though exponential limits in this context are
an active area of research (see, for example, Lalley and Zheng (in press)),
the question of rates does not appear to have been previously studied in
the literature. To this end, we make use of the construction from Lyons,
Pemantle, and Peres (1995); we refer to that article for more details on the
construction and only present what is needed for our purpose.

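Theorem 3.7. For a critical Galton-Watson branching process with offspring
distribution $\nu = \mathcal{L}(Z_1)$ such that $\mathbb{E}Z_1^3 < \infty$ we have

$$d_W\bigl(\mathcal{L}(2Z_n/(\sigma^2 n) \mid Z_n > 0), \mathrm{Exp}(1)\bigr) = O\!\left(\frac{\log n}{n}\right).$$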



Proof. First we construct a size-biased branching tree as in Lyons et al.
(1995). We assume that this tree is labeled and ordered, in the sense that,
if $w$ and $v$ are vertices in the tree from the same generation and $w$ is to the
left of $v$, then the offspring of $w$ is to the left of the offspring of $v$, too. Start
in generation 0 with one vertex $v_0$ and let it have a number of offspring
distributed according to the size-bias distribution of $\nu$. Pick one of the
offspring of $v_0$ uniformly at random and call it $v_1$. To each of the siblings of
$v_1$ attach an independent Galton-Watson branching process with offspring
distribution $\nu$. For $v_1$ proceed as for $v_0$, i.e., give it a size-biased number
of offspring, pick one uniformly at random, call it $v_2$, attach independent
Galton-Watson branching processes to the siblings of $v_2$ and so on. It is clear
that this will always give an infinite tree as the "spine" $v_0, v_1, v_2, \ldots$ of the
tree will never die out.



We now introduce some notation. Denote by $S_n$ the total number of
particles in generation $n$. Denote by $L_n$ and $R_n$, respectively, the number of
particles to the left (exclusive $v_n$) and to the right (inclusive $v_n$), respectively,
of vertex $v_n$, so that $S_n = L_n + R_n$. We can describe these particles in more
detail, according to the generation at which they split off from the spine.
Denote by $S_{n,j}$ the number of particles in generation $n$ that stem from any of
the siblings of $v_j$ (but not $v_j$ itself). Clearly, $S_n = 1 + \sum_{j=1}^{n} S_{n,j}$, where the
summands are independent. Likewise, let $L_{n,j}$ and $R_{n,j}$, respectively, be the
number of particles in generation $n$ that stem from the siblings to the left
and right, respectively, of $v_j$ (note that $L_{n,n}$ and $R_{n,n}$ are just the number
of siblings of $v_n$ to the left and to the right, respectively). We have the
relations $L_n = \sum_{j=1}^{n} L_{n,j}$ and $R_n = 1 + \sum_{j=1}^{n} R_{n,j}$. Note that, for fixed $j$,
$L_{n,j}$ and $R_{n,j}$ are in general not independent, as they are linked through the
offspring size of $v_{j-1}$.

Let now $R'_{n,j}$ be independent random variables such that

$$\mathcal{L}(R'_{n,j}) = \mathcal{L}(R_{n,j} \mid L_{n,j} = 0),$$

and, with $A_{n,j} = \{L_{n,j} = 0\}$, define

$$R^*_{n,j} = R_{n,j} I_{A_{n,j}} + R'_{n,j} I_{A^c_{n,j}} = R_{n,j} + (R'_{n,j} - R_{n,j}) I_{A^c_{n,j}}. \tag{3.28}$$

Define also $R^*_n = 1 + \sum_{j=1}^{n} R^*_{n,j}$. We collect a few facts which we will
then use to give the proof of the theorem:

(i) for any non-negative random variable $X$ the size-biased distribution
of $\mathcal{L}(X)$ is the same as the size-biased distribution of $\mathcal{L}(X \mid X > 0)$;

(ii) $S_n$ has the size-biased distribution of $Z_n$;

(iii) given $S_n$, the vertex $v_n$ is uniformly distributed among the particles
of the $n$th generation;

(iv) $\mathcal{L}(R^*_n) = \mathcal{L}(Z_n \mid Z_n > 0)$;

(vi) $\mathbb{E}\{R_{n,j} I_{A^c_{n,j}}\} \le \gamma \mathbb{P}[A^c_{n,j}]$, where $\gamma = \mathbb{E}Z_1^3$;

(vii) $\mathbb{P}[A^c_{n,j}] \le \sigma^2 \mathbb{P}[Z_{n-j} > 0] \le C(\nu)/(n - j + 1)$ for some absolute
constant $C(\nu)$.



Statement (i) is easy to verify, (ii) follows from Lyons et al. (1995, Eq. (2.2)),
(iii) follows from Lyons et al. (1995, comment after (2.2)), (iv) follows from
Lyons et al. (1995, Proof of Theorem C(ii)). Using independence,

where the second inequality is due to Lyons et al. (1995, Proof of Theorem
C(ii)), which proves (v). If $X_j$ denotes the number of siblings of $v_j$, having
the size bias distribution of $Z_1$ minus 1, we have

$$\mathbb{E}\{R_{n,j} I_{A^c_{n,j}}\} \le \sum_k k\, \mathbb{P}[X_j = k]\, \mathbb{P}[A^c_{n,j} \mid X_j = k] \le \sum_k k^2\, \mathbb{P}[X_j = k]\, \mathbb{P}[A^c_{n,j}] \le \gamma \mathbb{P}[A^c_{n,j}],$$

hence (vi). Finally,

$$\mathbb{P}[A^c_{n,j}] = \mathbb{E}\{\mathbb{P}[A^c_{n,j} \mid X_j]\} \le \mathbb{E}\{X_j \mathbb{P}[Z_{n-j} > 0]\} \le \sigma^2 \mathbb{P}[Z_{n-j} > 0].$$

Using Kolmogorov's estimate (see Lyons et al. (1995, Theorem C(i))) we
have

$$\lim_{n \to \infty} n \mathbb{P}[Z_n > 0] = \frac{2}{\sigma^2},$$

which implies (vii).

We are now in the position to prove the theorem using (2.5) of Theorem 2.1.
Let $c = 2/\sigma^2$. Due to (iv) we can set $W = cR^*_n/n$. Due to (i)
and (ii), $S_n$ has the size bias distribution of $R^*_n$. Let $U$ be an independent
and uniform random variable on $[0, 1]$. Now, $R_n - U$ is a continuous random
variable taking values on $[0, S_n]$ and, due to (iii), has distribution $\mathcal{L}(U S_n)$;
hence we can set $W^e = c(R_n - U)/n$. It remains to bound $\mathbb{E}|W - W^e|$.
From (3.28) and using (v)-(vii) we have

$$\frac{n\sigma^2}{2}\, \mathbb{E}|W - W^e| \le 1 + C(\nu)(\sigma^2 + \gamma) \sum_{j=1}^{n} \frac{1}{n - j + 1} \le 1 + C(\nu)(\sigma^2 + \gamma)(1 + \log n).$$

Hence, for a possibly different constant $C(\nu)$,

$$\mathbb{E}|W - W^e| \le \frac{C(\nu) \log n}{n}.$$

Plugging this into (2.7) yields the final bound. □

Remark 3.8. Note that, from (2.2), Theorem 3.7 implies a rate of convergence
of $O((\log(n)/n)^{1/2})$ for the Kolmogorov metric. However, if the
number of offspring has a geometric distribution (started at 0) with mean 1
then standard generating function arguments (e.g. Athreya and Ney (1972))
show that the number of particles at time $n$ conditioned on non-extinction
has geometric distribution $\mathrm{Ge}(1/(n+1))$ (started at 1). Hence, in this case,
the actual order of convergence is $n^{-1}$. We conjecture that this is also the
correct rate of convergence for more general offspring distributions.
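The geometric case of the remark can be explored numerically. The sketch below is an illustration under stated assumptions: for geometric offspring on $\{0, 1, \ldots\}$ with mean 1 the variance is $\sigma^2 = 2$, so $2Z_n/(\sigma^2 n) = Z_n/n$, and it takes from the remark that $Z_n$ given $Z_n > 0$ is $\mathrm{Ge}(1/(n+1))$ on $\{1, 2, \ldots\}$. The function name and the truncation point `m_max` are choices made here, not part of the original text.

```python
import math


def kolmogorov_geometric_vs_exp(n, m_max=None):
    """Kolmogorov distance between L(Z/n) and Exp(1) for Z ~ Ge(1/(n+1)) on {1, 2, ...}."""
    q = n / (n + 1.0)  # P(Z > m) = q**m for integers m >= 0
    if m_max is None:
        m_max = 60 * n  # both tails are negligible beyond this point
    d = 0.0
    for m in range(1, m_max + 1):
        g = 1.0 - math.exp(-m / n)  # Exp(1) distribution function at the jump point m/n
        left = 1.0 - q ** (m - 1)   # P(Z/n < m/n)
        right = 1.0 - q ** m        # P(Z/n <= m/n)
        d = max(d, abs(left - g), abs(right - g))
    return d


if __name__ == "__main__":
    for n in (10, 100, 1000):
        d = kolmogorov_geometric_vs_exp(n)
        print(n, d, n * d)
```

If the order of convergence is $n^{-1}$, the product `n * d` in the last column should stay bounded away from both 0 and infinity as `n` grows.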
4 PROOFS OF MAIN RESULTS

Our main results are based on the Stein operator

$$Af(x) = f'(x) - f(x) \tag{4.1}$$

previously studied (independently of each other and, in the case of the first
two, independent of the present work) by Weinberg (2005), Bon (2006) and
Chatterjee et al. (2006).

Let us first make a connection between this Stein operator and survival
analysis. Assume that $F$ is a differentiable distribution function of a
non-negative random variable $X$ having finite first moment such that the
derivative $F' > 0$. Let $\bar{F} = 1 - F$ be the survival function associated with
$X$. Then, for every function $f \in \mathcal{F}_0$ we have

$$\mathbb{E}f(X) = -\int_0^\infty f(x) \bar{F}'(x)\,dx = \int_0^\infty f'(x) \bar{F}(x)\,dx = \mathbb{E}\{h(X) f'(X)\},$$

where $h(x) = \bar{F}(x)/F'(x)$ is the inverse hazard rate function of $F$. This
approach seems to be new and contrasts with approaches such as the density
approach by Stein in Reinert (2005), where $f'(x)$ is held fixed and the
coefficient in front of $f(x)$ is chosen accordingly, or Nourdin and Peccati (2009),
where $x f(x)$ is held fixed and the coefficient in front of $f'(x)$ is altered.
Now as is well known, the hazard rate function of the standard exponential
distribution is 1, so that this approach yields the Stein operator (4.1) and
the Stein equation hence becomes

$$f'(w) - f(w) = h(w) - \mathbb{E}h(Z), \qquad w \ge 0. \tag{4.2}$$

Note that the density approach by Stein in Reinert (2005) yields the same
operator but with no apparent probabilistic interpretation. The solution $f$
to (4.2) can be explicitly written as

$$f(w) = -e^{w} \int_w^\infty (h(x) - \mathbb{E}h(Z)) e^{-x}\,dx. \tag{4.3}$$



To apply Stein's method we need to study the properties of the solution
(4.3). Some preliminary results can be found in Weinberg (2005), Bon
(2006), Chatterjee et al. (2006) and Daly. We give self-contained
proofs of the following bounds.

Lemma 4.1 (Properties of the solution to the Stein equation). Let $f$ be the
solution to (4.2). If $h$ is bounded we have

$$\|f\| \le \|h\|, \qquad \|f'\| \le 2\|h\|. \tag{4.4}$$

If $h$ is Lipschitz we have

$$|f(w)| \le (1 + w)\|h'\|, \qquad \|f'\| \le \|h'\|, \qquad \|f''\| \le 2\|h'\|. \tag{4.5}$$

For any $a > 0$ and any $\varepsilon > 0$ let

$$h_{a,\varepsilon}(x) := \varepsilon^{-1} \int_0^{\varepsilon} I[x + s \le a]\,ds. \tag{4.6}$$

Define $f_{a,\varepsilon}$ as in (4.3) with respect to $h_{a,\varepsilon}$. Define $h_{a,0}(x) = I[x \le a]$ and
$f_{a,0}$ accordingly. Then, for all $\varepsilon > 0$,

$$\|f_{a,\varepsilon}\| \le 1, \qquad \|f'_{a,\varepsilon}\| \le 1, \tag{4.7}$$

$$|f_{a,\varepsilon}(w + t) - f_{a,\varepsilon}(w)| \le 1, \qquad |f'_{a,\varepsilon}(w + t) - f'_{a,\varepsilon}(w)| \le 1, \tag{4.8}$$

$$|f'_{a,\varepsilon}(w + t) - f'_{a,\varepsilon}(w)| \le (|t| \wedge 1) + \varepsilon^{-1} \int_{t \wedge 0}^{t \vee 0} I[a - \varepsilon \le w + u \le a]\,du. \tag{4.9}$$

(4.7) and (4.8) also hold for $\varepsilon = 0$.

Proof. Write $\tilde{h}(w) = h(w) - \mathbb{E}h(Z)$. Assume now that $h$ is bounded. Then

$$|f(w)| \le e^{w} \int_w^\infty |\tilde{h}(x)| e^{-x}\,dx \le \|h\|.$$

Rearranging (4.2) we have $f'(w) = f(w) + \tilde{h}(w)$, hence

$$|f'(w)| \le |f(w)| + |\tilde{h}(w)| \le 2\|h\|.$$

This proves (4.4). Assume now that $h$ is Lipschitz. We can further assume
without loss of generality that $h(0) = 0$ as $f$ will not change under shift;
hence we may assume that $|h(x)| \le x\|h'\|$. Thus,

$$|f(w)| \le e^{w} \int_w^\infty x \|h'\| e^{-x}\,dx = (1 + w)\|h'\|,$$

which is the first bound of (4.5). Now, differentiate both sides of (4.2) to
obtain

$$f''(w) - f'(w) = h'(w), \tag{4.10}$$

hence, analogous to (4.3), we have

$$f'(w) = -e^{w} \int_w^\infty h'(x) e^{-x}\,dx.$$

The same arguments as before lead to the second and third bound of (4.5).
Let us now look at the properties of $f_{a,\varepsilon}$. It is easy to check that

$$f_{a,0}(x) = (e^{x-a} \wedge 1) - e^{-a}, \qquad f'_{a,0}(x) = e^{x-a} I[x \le a] \tag{4.11}$$

is the explicit solution to (4.10) with respect to $h_{a,0}$. Now, it is not difficult
to see that, for $\varepsilon > 0$, we can write

$$f_{a,\varepsilon}(x) = \varepsilon^{-1} \int_0^{\varepsilon} f_{a-s,0}(x)\,ds$$

and this $f_{a,\varepsilon}$ satisfies (4.2). These representations immediately lead to the
bounds (4.7) and (4.8) for $\varepsilon \ge 0$ from the explicit formulas (4.11). Let now
$\varepsilon > 0$; observe that, from (4.10),

$$f'(x + t) - f'(x) = (f(x + t) - f(x)) + (h(x + t) - h(x)).$$

Again from (4.11), we deduce that $|f_{a,\varepsilon}(x + t) - f_{a,\varepsilon}(x)| \le (|t| \wedge 1)$, which
yields the first part of the bound (4.9). For the second part, assume that
$t > 0$ and write

$$h_{a,\varepsilon}(x + t) - h_{a,\varepsilon}(x) = \int_0^t h'_{a,\varepsilon}(x + s)\,ds = -\varepsilon^{-1} \int_0^t I[a - \varepsilon \le x + u \le a]\,du.$$

Taking the absolute value this proves the second part of the bound (4.9) for
$t > 0$; a similar argument yields the same bound for $t < 0$. □
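The explicit formulas (4.11) can be sanity-checked numerically against the Stein equation (4.2) and the bounds (4.7) with $\varepsilon = 0$. The sketch below is only an illustration: the grid, the values of $a$ and the function names are choices made here, and it uses $\mathbb{E}h_{a,0}(Z) = \mathbb{P}(Z \le a) = 1 - e^{-a}$ for $Z \sim \mathrm{Exp}(1)$.

```python
import math


def h_a0(x, a):
    """Test function h_{a,0}(x) = I[x <= a]."""
    return 1.0 if x <= a else 0.0


def f_a0(x, a):
    """Candidate solution from (4.11): (e^{x-a} ∧ 1) - e^{-a}."""
    return min(math.exp(x - a), 1.0) - math.exp(-a)


def fprime_a0(x, a):
    """Its derivative from (4.11): e^{x-a} I[x <= a]."""
    return math.exp(x - a) if x <= a else 0.0


def stein_residual(x, a):
    """f'(x) - f(x) - (h(x) - E h(Z)); this is zero when (4.2) holds."""
    mean_h = 1.0 - math.exp(-a)  # E h_{a,0}(Z) for Z ~ Exp(1)
    return fprime_a0(x, a) - f_a0(x, a) - (h_a0(x, a) - mean_h)


def check(a, x_max=10.0, points=1000):
    """Return (max |residual|, max |f|, max |f'|) over an evenly spaced grid on [0, x_max]."""
    xs = [x_max * j / points for j in range(points + 1)]
    return (max(abs(stein_residual(x, a)) for x in xs),
            max(abs(f_a0(x, a)) for x in xs),
            max(abs(fprime_a0(x, a)) for x in xs))


if __name__ == "__main__":
    for a in (0.5, 1.0, 3.0):
        print(a, check(a))
```

The residual should vanish up to rounding, and the two sup norms should not exceed 1, in line with (4.7).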

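The following lemmas are straightforward and hence given without proof.

Lemma 4.2 (Smoothing lemma). For any $\varepsilon > 0$

$$d_K(\mathcal{L}(W), \mathcal{L}(Z)) \le \varepsilon + \sup_{a > 0} |\mathbb{E}h_{a,\varepsilon}(W) - \mathbb{E}h_{a,\varepsilon}(Z)|,$$

where $h_{a,\varepsilon}$ are defined as in Lemma 4.1.

Lemma 4.3 (Concentration inequality). For any random variable $V$,

$$\mathbb{P}[a \le V \le b] \le (b - a) + 2 d_K(\mathcal{L}(V), \mathrm{Exp}(1)).$$

For the rest of the article write $\kappa = d_K(\mathcal{L}(W), \mathrm{Exp}(1))$.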

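Proof of Theorem 2.1. Let $\Delta := W - W^e$. Define $I_1 := I[|\Delta| \le \beta]$; note
that $W^e$ may not have finite first moment. With $f$ as in (4.2) with respect
to (4.6), the quantity $\mathbb{E}f(W^e)$ is well defined as $\|f'\| < \infty$, and we have

$$\mathbb{E}\{f'(W) - f(W)\} = \mathbb{E}\{I_1 (f'(W) - f'(W^e))\} + \mathbb{E}\{(1 - I_1)(f'(W) - f'(W^e))\} =: J_1 + J_2.$$

Using (4.8), $|J_2| \le \mathbb{P}[|\Delta| > \beta]$. Now, using (4.10) and in the last step
Lemma 4.3,

$$\begin{aligned}
J_1 &= \mathbb{E}\Bigl\{I_1 \int_{-\Delta}^{0} f''(W + t)\,dt\Bigr\} \\
&= \mathbb{E}\Bigl\{I_1 \int_{-\Delta}^{0} \bigl(f'(W + t) - \varepsilon^{-1} I[a - \varepsilon \le W + t \le a]\bigr)\,dt\Bigr\} \\
&\le \mathbb{E}|I_1 \Delta| + \varepsilon^{-1} \int_{0}^{\beta} \mathbb{P}[a - \varepsilon \le W + t \le a]\,dt \le 2\beta + 2\beta\varepsilon^{-1}\kappa.
\end{aligned}$$

Similarly,

$$J_1 \ge -\mathbb{E}|I_1 \Delta| - \varepsilon^{-1} \int_{-\beta}^{0} \mathbb{P}[a - \varepsilon \le W + t \le a]\,dt \ge -2\beta - 2\beta\varepsilon^{-1}\kappa,$$

hence $|J_1| \le 2\beta + 2\beta\varepsilon^{-1}\kappa$. Using Lemma 4.2 and choosing $\varepsilon = 4\beta$,

$$\kappa \le \varepsilon + \mathbb{P}[|\Delta| > \beta] + 2\beta + 2\beta\varepsilon^{-1}\kappa \le \mathbb{P}[|\Delta| > \beta] + 6\beta + 0.5\kappa.$$

Solving for $\kappa$ proves (2.5).

To obtain (2.6), write

$$\mathbb{E}\{f'(W^e) - f(W^e)\} = \mathbb{E}\{I_1 (f(W) - f(W^e))\} + \mathbb{E}\{(1 - I_1)(f(W) - f(W^e))\}.$$

Hence, using Taylor's expansion along with the bounds (4.7) for $\varepsilon = 0$,

$$|\mathbb{E}\{f'(W^e) - f(W^e)\}| \le \|f'\|\, \mathbb{E}|I_1 \Delta| + \mathbb{P}[|\Delta| > \beta] \le \beta + \mathbb{P}[|\Delta| > \beta],$$

which gives (2.6).

Assume now in addition that $W$ has finite variance so that $W^e$ has finite
mean. Then

$$|\mathbb{E}\{f'(W) - f(W)\}| = |\mathbb{E}\{f'(W) - f'(W^e)\}| \le \|f''\|\, \mathbb{E}|\Delta|.$$

From (4.5) the bound (2.7) follows. Also,

$$|\mathbb{E}\{f'(W^e) - f(W^e)\}| \le \|f'\|\, \mathbb{E}|\Delta|,$$

which yields (2.8) from (4.7) with $\varepsilon = 0$; the remark after (2.8) follows
from (4.5). □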



Proof of Theorem 2.2. Let $f$ be the solution (4.3) to (4.2), hence $f(0) = 0$,
and assume that $f$ is Lipschitz. From the fundamental theorem of calculus
we have

$$f(W') - f(W) = \int_0^D f'(W + t)\,dt.$$

Multiplying both sides by $G$ and comparing it with the left hand side of
(4.2) we have

$$\begin{aligned}
f'(W) - f(W) &= Gf(W') - Gf(W) - f(W) \\
&\quad + (1 - GD) f'(W'') \\
&\quad + (1 - GD)(f'(W) - f'(W'')) \\
&\quad - G \int_0^D (f'(W + t) - f'(W))\,dt.
\end{aligned}$$
Note that we can take expectation component-wise due to the moment
assumptions. Hence,

$$\mathbb{E}h(W) - \mathbb{E}h(Z) = R_1(f) + R_2(f) + R_3(f) - R_4(f)$$

where

$$\begin{aligned}
R_1(f) &= \mathbb{E}\{Gf(W') - Gf(W) - f(W)\}, \\
R_2(f) &= \mathbb{E}\{(1 - GD) f'(W'')\}, \\
R_3(f) &= \mathbb{E}\{(1 - GD)(f'(W) - f'(W''))\}, \\
R_4(f) &= \mathbb{E}\Bigl\{G \int_0^D (f'(W + t) - f'(W))\,dt\Bigr\}.
\end{aligned}$$
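Assume now that $h \in \mathcal{F}_{BW}$ and $f$ the solution to (4.2). Then from (4.4)
and (4.5)

$$\|f\| \le 1, \qquad \|f'\| \le 1, \qquad \|f''\| \le 2.$$

Hence, $f \in \mathcal{F}_{BW}$ and

$$|R_1(f)| \le r_1(\mathcal{F}_{BW}), \qquad |R_2(f)| \le r_2.$$

Furthermore,

$$\begin{aligned}
|R_3(f)| &\le \mathbb{E}|(1 - GD)(f'(W'') - f'(W))| \\
&\le 2\mathbb{E}\{|1 - GD|\, I[|D'| > 1]\} + 2\mathbb{E}\{|1 - GD|\,(|D'| \wedge 1)\}
\end{aligned}$$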

= H + 2rl, 



and

$$\begin{aligned}
|R_4(f)| &\le \mathbb{E}\Bigl|G \int_0^D (f'(W + t) - f'(W))\,dt\Bigr| \\
&\le 2\mathbb{E}|GD\, I[|D| > 1]| + 2\mathbb{E}|G (D^2 \wedge 1)|
\end{aligned}$$
= 2r3 + 2r4. 

This yields the $d_{BW}$ results. Let now $h \in \mathcal{F}_W$ and $f$ the solution to (4.2).
Then, from (4.5),

$$\|f'\| \le 1, \qquad \|f''\| \le 2,$$

hence the bounds on $R_2(f)$, $R_3(f)$ and $R_4(f)$ remain, whereas now $f \in \mathcal{F}_W$
and, thus, $|R_1(f)| \le r_1(\mathcal{F}_W)$. This proves the $d_W$ estimate.

Let now $f$ be the solution to (4.2) with respect to $h_{a,\varepsilon}$ as in (4.6). Then,
from (4.7),

$$\|f\| \le 1, \qquad \|f'\| \le 1,$$

hence $f \in \mathcal{F}_{BW}$, $|R_1(f)| \le r_1(\mathcal{F}_{BW})$ and $|R_2(f)| \le r_2$. Let
$I_1 = I[|G| \le \alpha, |D| \le \beta, |D'| \le \beta']$. Write

$$\begin{aligned}
R_3(f) &= \mathbb{E}\{(1 - I_1)(1 - GD)(f'(W) - f'(W''))\} \\
&\quad + \mathbb{E}\{I_1 (1 - GD)(f'(W) - f'(W''))\} =: J_1 + J_2.
\end{aligned}$$

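Using (4.7), the bound on $|J_1|$ is immediate. Using (4.9) and Lemma 4.3,

$$\begin{aligned}
|J_2| &\le \mathbb{E}|(GD - 1) I_1 (f'(W'') - f'(W))| \\
&\le (\alpha\beta + 1)\beta' + (\alpha\beta + 1)\varepsilon^{-1} \int_{-\beta'}^{\beta'} \mathbb{P}[a - \varepsilon \le W + u \le a]\,du \\
&\le (\alpha\beta + 1)\beta' + (\alpha\beta + 1)\varepsilon^{-1} \int_{-\beta'}^{\beta'} (\varepsilon + 2\kappa)\,du \\
&= 3(\alpha\beta + 1)\beta' + 4(\alpha\beta + 1)\beta'\varepsilon^{-1}\kappa.
\end{aligned}$$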
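Similarly, let $I_2 = I[|G| \le \alpha, |D| \le \beta]$ and write

$$\begin{aligned}
R_4(f) &= \mathbb{E}\Bigl\{G(1 - I_2) \int_0^D (f'(W + t) - f'(W))\,dt\Bigr\} \\
&\quad + \mathbb{E}\Bigl\{G I_2 \int_0^D (f'(W + t) - f'(W))\,dt\Bigr\} =: J_3 + J_4.
\end{aligned}$$

By (4.7), the bound on $|J_3|$ is again immediate. Using again (4.9) and Lemma 4.3,

$$\begin{aligned}
|J_4| &\le \mathbb{E}\Bigl|G I_2 \int_0^D |f'(W + t) - f'(W)|\,dt\Bigr| \\
&\le \alpha\, \mathbb{E}\Bigl\{\int_{-\beta}^{\beta} \Bigl[(|t| \wedge 1) + \varepsilon^{-1} \int_{t \wedge 0}^{t \vee 0} I[a - \varepsilon \le W + u \le a]\,du\Bigr] dt\Bigr\} \\
&\le \alpha\beta^2 + \alpha\varepsilon^{-1} \int_{-\beta}^{\beta} \int_{t \wedge 0}^{t \vee 0} (\varepsilon + 2\kappa)\,du\,dt = 2\alpha\beta^2 + 2\alpha\beta^2\varepsilon^{-1}\kappa.
\end{aligned}$$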



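Using Lemma 4.2 and collecting the bounds above we obtain

$$\begin{aligned}
\kappa &\le \varepsilon + r_1(\mathcal{F}_{BW}) + r_2 + |J_1| + |J_2| + |J_3| + |J_4| \\
&\le \varepsilon + r_1(\mathcal{F}_{BW}) + r_2 + |J_1| + |J_3| + 3(\alpha\beta + 1)\beta' + 2\alpha\beta^2 \\
&\quad + (4(\alpha\beta + 1)\beta' + 2\alpha\beta^2)\varepsilon^{-1}\kappa,
\end{aligned}$$

so that, setting $\varepsilon = 8(\alpha\beta + 1)\beta' + 4\alpha\beta^2$,

$$\kappa \le r_1(\mathcal{F}_{BW}) + r_2 + |J_1| + |J_3| + 11(\alpha\beta + 1)\beta' + 6\alpha\beta^2 + 0.5\kappa.$$

Solving for $\kappa$ yields the final bound. □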